The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of the participants per challenge took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while winning prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a considerable portion of participants (32%) stated that they did not have enough time for method development, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based, and of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once; this was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
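To make the most commonly reported strategies concrete, below is a minimal sketch of k-fold cross-validation combined with ensembling of the per-fold models, assuming a generic scikit-learn workflow with toy data; the classifier is a stand-in for the deep models actually used by participants.

```python
# Minimal sketch: k-fold cross-validation + ensembling of identical models,
# two of the strategies reported in the survey. Model and data are toy stand-ins.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestClassifier  # stand-in for a deep model

X = np.random.rand(200, 16)          # toy feature matrix (e.g., image descriptors)
y = np.random.randint(0, 2, 200)     # toy binary labels

fold_models = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = RandomForestClassifier(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    print("fold val accuracy:", model.score(X[val_idx], y[val_idx]))
    fold_models.append(model)

# Ensemble by averaging the per-fold predicted probabilities.
X_test = np.random.rand(10, 16)
ensemble_prob = np.mean([m.predict_proba(X_test)[:, 1] for m in fold_models], axis=0)
print("ensembled predictions:", (ensemble_prob > 0.5).astype(int))
```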
Recently, the success of pre-training in the text domain has been fully extended to vision, audio, and cross-modal scenarios. The proposed pre-training models of different modalities show a rising trend of homogeneity in their model structures, which brings the opportunity to implement different pre-training models within a uniform framework. In this paper, we present TencentPretrain, a toolkit supporting pre-training models of different modalities. The core feature of TencentPretrain is its modular design. The toolkit uniformly divides pre-training models into 5 components: embedding, encoder, target embedding, decoder, and target. As almost all common modules are provided within each component, users can choose the desired modules from different components to build a complete pre-training model. The modular design enables users to efficiently reproduce existing pre-training models or build brand-new ones. We test the toolkit on text, vision, and audio benchmarks and show that it can match the performance of the original implementations.
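The modular composition described above can be illustrated with a minimal sketch, assuming a PyTorch-style interface; the classes below are hypothetical and do not reflect TencentPretrain's actual API.

```python
# Hypothetical sketch of assembling a pre-training model from interchangeable
# components (embedding -> encoder -> target), in the spirit of the modular design.
import torch
import torch.nn as nn

class WordEmbedding(nn.Module):               # "embedding" component
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
    def forward(self, tokens):
        return self.emb(tokens)

class TransformerEncoder(nn.Module):          # "encoder" component
    def __init__(self, dim=64):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
    def forward(self, x):
        return self.enc(x)

class MLMTarget(nn.Module):                   # "target" component (masked-LM head)
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.head = nn.Linear(dim, vocab_size)
    def forward(self, hidden):
        return self.head(hidden)

class PretrainModel(nn.Module):
    """A pre-training model assembled from chosen components."""
    def __init__(self, embedding, encoder, target):
        super().__init__()
        self.embedding, self.encoder, self.target = embedding, encoder, target
    def forward(self, tokens):
        return self.target(self.encoder(self.embedding(tokens)))

model = PretrainModel(WordEmbedding(), TransformerEncoder(), MLMTarget())
logits = model(torch.randint(0, 1000, (2, 16)))   # (batch=2, seq=16) -> (2, 16, vocab)
print(logits.shape)
```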
Deep Convolutional Neural Networks (DCNNs) have exhibited impressive performance on image super-resolution tasks. However, these deep learning-based super-resolution methods perform poorly on real-world super-resolution tasks, where paired high-resolution (HR) and low-resolution (LR) images are unavailable and the LR images are degraded by complicated and unknown kernels. To overcome these limitations, we propose the Unsupervised Bi-directional Cycle Domain Transfer Learning-based Generative Adversarial Network (UBCDTL-GAN), which consists of an Unsupervised Bi-directional Cycle Domain Transfer Network (UBCDTN) and a Semantic Encoder guided Super Resolution Network (SESRN). First, the UBCDTN produces an approximated real-like LR image by transferring the LR image from an artificially degraded domain to the real-world LR image domain. Second, the SESRN super-resolves the approximated real-like LR image to a photo-realistic HR image. Extensive experiments on unpaired real-world image benchmark datasets demonstrate that the proposed method achieves superior performance compared to state-of-the-art methods.
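The bi-directional domain transfer in UBCDTN is reminiscent of cycle-consistency training; below is a minimal sketch of such a cycle loss, assuming hypothetical toy generators G (synthetic LR → real-like LR) and F (real LR → synthetic-like LR) rather than the paper's actual networks.

```python
# Minimal sketch of a bi-directional cycle-consistency loss between the
# artificially degraded LR domain and the real-world LR domain; G, F, and
# the data here are toy stand-ins.
import torch
import torch.nn as nn

G = nn.Conv2d(3, 3, 3, padding=1)   # synthetic LR -> real-like LR (toy generator)
F = nn.Conv2d(3, 3, 3, padding=1)   # real LR -> synthetic-like LR (toy generator)
l1 = nn.L1Loss()

x_syn = torch.rand(4, 3, 32, 32)    # artificially degraded LR images
x_real = torch.rand(4, 3, 32, 32)   # unpaired real-world LR images

# Forward and backward cycles: x -> G(x) -> F(G(x)) should reconstruct x.
cycle_loss = l1(F(G(x_syn)), x_syn) + l1(G(F(x_real)), x_real)
print("cycle-consistency loss:", cycle_loss.item())
```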
Face super-resolution is a domain-specific image super-resolution task that aims to generate High-Resolution (HR) face images from their Low-Resolution (LR) counterparts. In this paper, we propose a novel face super-resolution method, namely the Semantic Encoder guided Generative Adversarial Face Ultra-Resolution Network (SEGA-FURN), to ultra-resolve an unaligned tiny LR face image to its HR counterpart with multiple ultra-upscaling factors (e.g., 4x and 8x). The proposed network is composed of a novel semantic encoder that captures the embedded semantics to guide adversarial learning, and a novel generator that uses a hierarchical architecture named Residual in Internal Dense Block (RIDB). Moreover, we propose a joint discriminator that discriminates both image data and embedded semantics, learning the joint probability distribution of the image space and the latent space. We also use the Relativistic average Least Squares loss (RaLS) as the adversarial loss to alleviate the gradient vanishing problem and enhance the stability of training. Extensive experiments on large face datasets demonstrate that the proposed method achieves superior super-resolution results and significantly outperforms other state-of-the-art methods in both qualitative and quantitative comparisons.
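The RaLS adversarial loss follows the relativistic average least-squares formulation; a minimal sketch, with toy scores standing in for the joint discriminator's outputs:

```python
# Minimal sketch of the Relativistic average Least Squares (RaLS) adversarial
# loss, following the standard RaGAN-LS formulation; scores are toy placeholders.
import torch

def rals_d_loss(real_scores, fake_scores):
    # Discriminator: real should score above the average fake, and vice versa.
    return ((real_scores - fake_scores.mean() - 1) ** 2).mean() + \
           ((fake_scores - real_scores.mean() + 1) ** 2).mean()

def rals_g_loss(real_scores, fake_scores):
    # Generator: the relativistic roles are swapped.
    return ((real_scores - fake_scores.mean() + 1) ** 2).mean() + \
           ((fake_scores - real_scores.mean() - 1) ** 2).mean()

real = torch.randn(8)   # D outputs on real (image, semantics) pairs
fake = torch.randn(8)   # D outputs on generated pairs
print(rals_d_loss(real, fake).item(), rals_g_loss(real, fake).item())
```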
As inversion methods have evolved, GAN inversion has come to consist mainly of two steps. The first step is Image Embedding, in which an encoder or an optimization process embeds the image to obtain the corresponding latent code. The second step then aims to refine the inversion and editing results, which we name Result Refinement. Although the second step significantly improves fidelity, perception and editability remain almost unchanged, since they depend deeply on the inverse latent codes obtained in the first step. Therefore, a crucial problem is how to obtain latent codes with better perception and editability while retaining reconstruction fidelity. In this work, we first point out that these two characteristics are related to the degree of alignment (or misalignment) of the inverse codes with the synthetic distribution. We then propose the Latent Space Alignment Inversion Paradigm (LSAP), which consists of an evaluation metric and a solution. Specifically, we introduce the Normalized Style Space ($\mathcal{S^N}$ space) and the $\mathcal{S^N}$ Cosine Distance (SNCD) to measure the misalignment of inversion methods. Since the proposed SNCD is differentiable, it can be optimized in both encoder-based and optimization-based embedding methods to provide a uniform solution. Extensive experiments in various domains demonstrate that SNCD effectively reflects perception and editability, and that our alignment paradigm achieves state-of-the-art results in both steps. Code is available at https://github.com/caopulan/ganinverter.
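As a rough illustration of a misalignment measure in the spirit of SNCD, below is a hypothetical sketch that normalizes style codes with distribution statistics and computes a cosine distance; the paper's exact normalization of the style space may differ.

```python
# Hypothetical sketch of a cosine-distance misalignment measure between inverted
# style codes and the synthetic distribution; the exact SNCD definition may differ.
# Because the measure is differentiable, it could serve as a training loss.
import torch
import torch.nn.functional as F

def normalize_style(s, mean, std):
    """Map style codes into a normalized style space using distribution statistics."""
    return (s - mean) / (std + 1e-8)

def sn_cosine_distance(s_inv, s_syn, mean, std):
    """1 - cosine similarity between normalized inverted and synthetic codes."""
    a = normalize_style(s_inv, mean, std)
    b = normalize_style(s_syn, mean, std)
    return 1.0 - F.cosine_similarity(a, b, dim=-1).mean()

mean, std = torch.zeros(512), torch.ones(512)   # toy statistics of the synthetic codes
s_inverted = torch.randn(4, 512)                # codes from an encoder / optimization
s_sampled = torch.randn(4, 512)                 # codes sampled from the generator's prior
print("SNCD-style misalignment:",
      sn_cosine_distance(s_inverted, s_sampled, mean, std).item())
```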
Existing LiDAR fiducial marker systems have usage restrictions. In particular, LiDARTag requires a specific marker placement, and image-based LiDAR fiducial markers require the point cloud to be sampled from a single viewpoint. As a result, detecting fiducial markers in point clouds sampled from multiple viewpoints remains an unsolved problem. In this letter, we develop a novel algorithm to detect fiducial markers in multi-view point clouds. The proposed algorithm consists of two stages. First, Region of Interest (ROI) detection finds point clusters that could contain fiducial markers. Specifically, a method extracting ROIs from the intensity perspective is introduced, given that, from the spatial perspective, the markers, which are sheets of paper or thin boards, are indistinguishable from the planes to which they are attached. Second, marker detection verifies whether a candidate ROI contains a fiducial marker and outputs the ID numbers and vertex locations of the markers in the valid ROIs. In particular, each ROI is transferred to a predefined intermediate plane so that spherical projection can be adopted to generate an intensity image, and marker detection is then completed on the intensity image. Qualitative and quantitative experimental results are provided to validate the proposed algorithm. Code and results are available at: https://github.com/york-sdcnlab/marker-detection-general
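The spherical projection step can be illustrated with a minimal sketch that bins points by azimuth and elevation into an intensity image; the resolution and the toy data below are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of projecting a LiDAR point cloud onto a spherical intensity
# image, as used in the marker-detection stage; values are illustrative.
import numpy as np

def spherical_intensity_image(points, intensity, h=64, w=256):
    """Bin points by (elevation, azimuth); keep one intensity value per pixel."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8
    azimuth = np.arctan2(y, x)                  # [-pi, pi]
    elevation = np.arcsin(z / r)                # [-pi/2, pi/2]
    u = ((azimuth + np.pi) / (2 * np.pi) * (w - 1)).astype(int)
    v = ((elevation + np.pi / 2) / np.pi * (h - 1)).astype(int)
    img = np.zeros((h, w), dtype=np.float32)
    img[v, u] = intensity                       # later points overwrite per pixel
    return img

pts = np.random.randn(5000, 3)                  # toy point cloud
inten = np.random.rand(5000)                    # toy per-point reflectance
print(spherical_intensity_image(pts, inten).shape)   # (64, 256)
```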
From a biomechanical point of view, swing arms play an irreplaceable role in facilitating the highly dynamic motion of bipedal robots by providing a larger angular momentum control space. Due to the lack of appropriate motion control strategies, however, few bipedal robots make use of swing arms and the redundancy of their multiple degrees of freedom with well-integrated modeling and control. This paper proposes a control strategy that models the bipedal robot as a flywheel spring-loaded inverted pendulum (F-SLIP) to extract the characteristics of the swing arms, and uses a whole-body controller (WBC) to realize these characteristics. We also design an evaluation system covering three aspects of a bipedal robot's highly dynamic motion: agility as we define it, stability, and energy consumption. We design several sets of simulation experiments and, according to this evaluation system, analyze the effects of the swing arms on Purple (Oriental Purple Energy Rising) V1.0, a bipedal robot designed for testing highly explosive motion. The results show that, after introducing the swing arms, Purple's agility increases by more than 10%, its stabilization time is reduced by a factor of two, and its energy consumption is reduced by more than 20%.
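As a rough illustration of the F-SLIP abstraction, below is a toy planar simulation in which a flywheel torque (standing in for the swing arms) enters the stance-phase dynamics; the simplified equations and all parameter values are illustrative assumptions, not the paper's model.

```python
# Toy planar sketch of flywheel spring-loaded inverted pendulum (F-SLIP) stance
# dynamics: the flywheel torque reacts on the body's angular motion, which is
# how arm swing enlarges the angular momentum control space. All values are toy.
import numpy as np

m, I, k, l0, g = 30.0, 1.5, 8000.0, 0.9, 9.81   # mass, flywheel inertia, spring, rest leg length, gravity

def fslip_step(state, tau, dt=1e-3):
    """One Euler step. state = [r, r_dot, th, th_dot, phi_dot] in leg-polar
    coordinates about the foot; tau is the flywheel (arm) torque."""
    r, rd, th, thd, phid = state
    rdd = r * thd**2 - g * np.cos(th) + k * (l0 - r) / m             # radial (spring leg)
    thdd = (g * np.sin(th) - 2 * rd * thd) / r - tau / (m * r**2)    # angular, with flywheel reaction
    phidd = tau / I                                                  # flywheel spin-up
    return np.array([r + rd*dt, rd + rdd*dt, th + thd*dt, thd + thdd*dt, phid + phidd*dt])

state = np.array([0.9, 0.0, 0.1, -1.0, 0.0])    # toy touchdown state
for _ in range(100):
    state = fslip_step(state, tau=5.0)
print("state after 0.1 s of stance:", state)
```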
Dynamic facial expression recognition (FER) databases provide important data support for affective computing and its applications. However, most FER databases are annotated with several basic, mutually exclusive emotion categories and contain only one modality, e.g., videos. Such monotonous labels and modalities cannot accurately mimic human emotions or fulfill real-world applications. In this paper, we present MAFW, a large-scale multi-modal compound affective database with 10,045 video-audio clips in the wild. Each clip is annotated with a compound emotion category and a couple of sentences describing the subjects' affective behaviors in the clip. For compound emotion annotation, each clip is categorized into one or more of 11 widely used emotions, i.e., anger, disgust, fear, happiness, neutral, sadness, surprise, contempt, anxiety, helplessness, and disappointment. To ensure high-quality labels, we filter out unreliable annotations with an Expectation-Maximization (EM) algorithm, and then obtain 11 single-label emotion categories and 32 multi-label emotion categories. To the best of our knowledge, MAFW is the first in-the-wild multi-modal database annotated with compound emotion annotations and emotion-related captions. In addition, we propose a novel Transformer-based expression snippet feature learning method to recognize compound emotions by exploiting the expression-change relationships among different emotions and modalities. Extensive experiments on MAFW show the advantages of the proposed method over other state-of-the-art methods on both uni-modal and multi-modal FER. The MAFW database is publicly available at https://mafw-database.github.io/mafw.
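The EM-based filtering can be illustrated with a generic sketch of annotator-reliability estimation in the spirit of Dawid-Skene; the paper's exact EM formulation may differ.

```python
# Generic sketch of EM-style annotation aggregation, illustrating how unreliable
# annotations can be down-weighted and filtered; not MAFW's exact procedure.
# votes[i, j] is annotator j's label for clip i (-1 = no annotation).
import numpy as np

def em_aggregate(votes, n_classes, iters=20):
    n_items, n_annot = votes.shape
    accuracy = np.full(n_annot, 0.8)                  # initial annotator reliability
    for _ in range(iters):
        # E-step: per-item class posteriors from reliability-weighted votes.
        post = np.ones((n_items, n_classes))
        for j in range(n_annot):
            mask = votes[:, j] >= 0
            for c in range(n_classes):
                agree = votes[mask, j] == c
                post[mask, c] *= np.where(agree, accuracy[j],
                                          (1 - accuracy[j]) / (n_classes - 1))
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate each annotator's accuracy against consensus labels.
        labels = post.argmax(axis=1)
        for j in range(n_annot):
            mask = votes[:, j] >= 0
            accuracy[j] = np.clip((votes[mask, j] == labels[mask]).mean(), 0.05, 0.95)
    return labels, accuracy

votes = np.array([[0, 0, 1], [1, 1, 0], [2, 2, 2], [0, 1, 0]])  # toy votes
labels, acc = em_aggregate(votes, n_classes=3)
print("aggregated labels:", labels, "annotator accuracies:", acc)
```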
Sample assignment plays an important role in modern object detection methods. However, most existing methods rely on manual designs to assign positive/negative samples, which do not explicitly establish the relationship between sample assignment and object detection performance. In this work, we propose a novel dynamic sample assignment scheme based on hyper-parameter search. We first define the number of positive samples assigned to each ground truth as a hyper-parameter, and employ a surrogate optimization algorithm to derive the optimal choice. Then, we design a dynamic sample assignment procedure to select the optimal number of positives at each training iteration. Experiments demonstrate that the resulting HPS-Det brings improved performance over different object detection baselines. Moreover, we analyze the hyper-parameter reusability when transferring between different datasets and between different backbones for object detection, which exhibits the superiority and versatility of our method.
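A minimal sketch of the underlying assignment primitive: mark the top-k candidates per ground truth as positives, where k is the hyper-parameter such a search would optimize. The IoU-based scoring and all values below are illustrative assumptions.

```python
# Minimal sketch of top-k positive-sample assignment, with k as the searched
# hyper-parameter; the candidate scoring here is an illustrative stand-in.
import numpy as np

def assign_topk_positives(iou_matrix, k):
    """iou_matrix: (num_gt, num_anchors). Mark the k highest-IoU anchors per GT positive."""
    labels = np.zeros(iou_matrix.shape[1], dtype=int)   # 0 = negative
    for gt_idx in range(iou_matrix.shape[0]):
        topk = np.argsort(iou_matrix[gt_idx])[-k:]      # k best candidates for this GT
        labels[topk] = 1                                # 1 = positive
    return labels

rng = np.random.default_rng(0)
ious = rng.random((3, 20))                              # 3 ground truths, 20 anchors
for k in (3, 5, 9):                                     # candidate values a search would try
    print(f"k={k}: {assign_topk_positives(ious, k).sum()} positives")
```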
Knowledge distillation (KD) can effectively transfer knowledge from a cumbersome network (teacher) to a compact network (student), and has demonstrated its advantages in some computer vision applications. The representation of knowledge is vital for knowledge transfer and student learning, but it is usually defined in a hand-crafted manner or taken directly from intermediate features. In this paper, we propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task. It provides a more flexible and accurate way to help the teacher transmit knowledge, in accordance with the student's ability, via a knowledge representation network (KRNet) with learnable parameters. To improve the perception ability of the knowledge representation with respect to the student's needs, we model the transformation process from intermediate outputs to transferred knowledge by employing the student's features and the teacher-student correlation in KRNet. Specifically, texture-aware dynamic kernels are generated and then used to extract the texture features to be improved, and the corresponding teacher guidance is decomposed into texture supervision to further promote the recovery quality of high-frequency details. Moreover, KRNet is optimized in a meta-learning manner to ensure that knowledge transfer and student learning are beneficial to improving the student's reconstruction quality. Experiments on various single image super-resolution datasets demonstrate that the proposed method outperforms existing distillation methods with defined knowledge representations, and can help super-resolution algorithms achieve better reconstruction quality without introducing any inference complexity.
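As a rough illustration of texture-aware dynamic kernels, below is a hypothetical sketch in which a small network predicts per-sample depthwise kernels from the student's features, which are then applied to extract texture features; this conveys the idea only and is not KRNet's actual design.

```python
# Hypothetical sketch of dynamic-kernel texture feature extraction: a small
# network predicts per-sample depthwise convolution kernels from pooled features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicKernelExtractor(nn.Module):
    def __init__(self, channels=32, ksize=3):
        super().__init__()
        self.ksize, self.channels = ksize, channels
        # Predict one depthwise k x k kernel per channel from pooled features.
        self.kernel_net = nn.Linear(channels, channels * ksize * ksize)

    def forward(self, feat):                       # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        kernels = self.kernel_net(feat.mean(dim=(2, 3)))          # (B, C*k*k)
        kernels = kernels.view(b * c, 1, self.ksize, self.ksize)  # depthwise kernels
        out = F.conv2d(feat.reshape(1, b * c, h, w), kernels,
                       padding=self.ksize // 2, groups=b * c)     # per-sample conv
        return out.view(b, c, h, w)

x = torch.randn(2, 32, 16, 16)                     # toy student feature map
print(DynamicKernelExtractor()(x).shape)           # torch.Size([2, 32, 16, 16])
```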